A.5. Multi-Byte Characters

Characters from East Asian languages such as Chinese, Japanese and Korean (CJK) cannot be represented in a single-byte (8-bit) character set, unlike the characters of many European languages. CJK text is therefore typically stored in a multi-byte encoding, for example the variable-length UTF-8.

The following multi-byte characters can be used in test cases that exercise multi-byte character handling:

Chinese Characters:

U+5317 U+4EAC Beijing

Japanese Characters:

U+3042 Hiragana Letter A
U+3044 Hiragana Letter I
U+3046 Hiragana Letter U
U+3048 Hiragana Letter E
U+304A Hiragana Letter O

Korean Characters:

U+1100 Hangul Choseong Kiyeok (romanized k/g)
U+1105 Hangul Choseong Rieul (romanized r/l)
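
The code points listed above can be turned into test strings programmatically. The following is a minimal sketch in Python, not part of any specification; the names TEST_CODE_POINTS, build_test_string and utf8_round_trip are hypothetical and chosen only for illustration. It builds each string, encodes it as UTF-8 and checks that decoding returns the original text. All of the listed code points lie between U+0800 and U+FFFF, so each one occupies three bytes in UTF-8.

```python
# Code points copied from the lists above. The dictionary name and its
# keys are illustrative assumptions, not defined by any specification.
TEST_CODE_POINTS = {
    "chinese": [0x5317, 0x4EAC],
    "japanese": [0x3042, 0x3044, 0x3046, 0x3048, 0x304A],
    "korean": [0x1100, 0x1105],
}


def build_test_string(code_points):
    """Join a list of Unicode code points into a single string."""
    return "".join(chr(cp) for cp in code_points)


def utf8_round_trip(text):
    """Encode text as UTF-8 and decode it again; return (bytes, text)."""
    encoded = text.encode("utf-8")
    return encoded, encoded.decode("utf-8")


if __name__ == "__main__":
    for name, code_points in TEST_CODE_POINTS.items():
        text = build_test_string(code_points)
        encoded, decoded = utf8_round_trip(text)
        # A lossless round trip means the decoded text equals the input.
        assert decoded == text
        print(name, encoded.hex(), len(encoded), "bytes")
```

A test case could feed the resulting strings into a calendar property value and compare what the receiving implementation stores or displays with the original string.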

Implementations should use UTF-8 as the default character set, as recommended by iCalendar, so that multi-byte characters (such as those of the CJK languages) are displayed correctly. However, mobile devices may use a character set specific to the market in which the device is sold (e.g. many Japanese phones use Shift-JIS exclusively). Regardless of the implementation, users never want to see “%^($%^^##@???”.
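
The effect of a character set mismatch can be illustrated with a short sketch. It assumes Python's built-in "shift_jis" and "utf-8" codecs; the helper names can_encode and decode_with are hypothetical. It checks whether a test string can be represented in a market-specific character set at all, and shows that bytes decoded with the wrong character set no longer match the original text.

```python
def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def decode_with(data, charset):
    """Decode bytes with charset, substituting undecodable sequences."""
    return data.decode(charset, errors="replace")


hiragana = "\u3042\u3044\u3046\u3048\u304A"  # Hiragana A, I, U, E, O
hangul = "\u1100\u1105"                      # Hangul Kiyeok, Rieul

# Shift-JIS covers the Hiragana test characters but not the Hangul ones.
print(can_encode(hiragana, "shift_jis"))
print(can_encode(hangul, "shift_jis"))

# The sender writes UTF-8, but the receiver assumes Shift-JIS:
# the decoded text differs from what was sent.
sent = hiragana.encode("utf-8")
received = decode_with(sent, "shift_jis")
print(received == hiragana)
```

A device that uses a market-specific character set internally would need a check of this kind, or an explicit conversion step, before storing or displaying text received in UTF-8.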